Support newer versions of SpatialData with zarr v3 stores #15
Conversation
…& 0.6, also proxy server script, workflow, vscode settings noise
remove some dead code/comments.
When opening an image from a v0.6.1 dataset:
I'm not sure why
try to more thoroughly avoid issues with trailing slash in URL.
Warning: Rate limit exceeded — a review can be triggered again after the wait time has elapsed.

📝 Walkthrough

Adds Zarr v2/v3 detection and normalization, a consolidated tree-backed store parser and Zod-based Zarr validators, per-version Python fixture generation and validation tooling, a local test server/CORS proxy, a CI workflow, tests, and several package dependency/type updates across the monorepo.
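For orientation, a Zod validator for a Zarr v3 array node could look roughly like the sketch below; the schema name and exact field set are illustrative, drawn from the Zarr v3 spec rather than from this PR's actual validators.

```ts
import { z } from 'zod';

// Illustrative sketch only: a Zod schema for a Zarr v3 array node's zarr.json,
// named and shaped per the Zarr v3 spec, not copied from this PR.
const zarrV3ArraySchema = z
  .object({
    zarr_format: z.literal(3),
    node_type: z.literal('array'),
    shape: z.array(z.number().int().nonnegative()),
    data_type: z.string(),
    chunk_grid: z.object({
      name: z.literal('regular'),
      configuration: z.object({ chunk_shape: z.array(z.number().int().positive()) }),
    }),
    chunk_key_encoding: z.object({ name: z.string() }).passthrough(),
    fill_value: z.union([z.number(), z.string(), z.boolean(), z.null()]),
    codecs: z.array(z.object({ name: z.string() }).passthrough()),
    attributes: z.record(z.unknown()).optional(),
  })
  .passthrough();

type ZarrV3Array = z.infer<typeof zarrV3ArraySchema>;
```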
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Open as openExtraConsolidated
    participant Try as tryConsolidated
    participant Remote as RemoteStore
    participant Norm as ValidatorNormalizer
    participant Parser as parseStoreContents
    participant Tree as ZarrTree
    Client->>Open: openExtraConsolidated(source)
    Open->>Try: tryConsolidated(store)
    Try->>Remote: fetch zarr.json
    alt v3 present
        Remote-->>Norm: v3 consolidated_metadata
        Norm->>Parser: validateV3Zarray & build nodes
    else v3 absent
        Try->>Remote: fetch .zmetadata / zmetadata (v2)
        Remote-->>Norm: v2 metadata
        Norm->>Norm: validateAndConvertV2Zarray -> v3-normalized
        Norm->>Parser: normalized metadata
    end
    Parser->>Tree: build path-based ZarrTree
    Parser-->>Open: return { zarritaStore, tree }
    Open-->>Client: Ok({zarritaStore, tree})
```
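Read as code, the v3-first/v2-fallback probe in the diagram amounts to something like the sketch below; the helper names and the exact keys checked are assumptions for illustration, not the PR's implementation.

```ts
// Hypothetical sketch of the detection order shown above.
type Consolidated = { metadata: Record<string, unknown>; zarrFormat: 2 | 3 };

async function fetchJson(url: string): Promise<unknown | undefined> {
  const res = await fetch(url);
  return res.ok ? res.json() : undefined;
}

async function tryConsolidatedSketch(baseUrl: string): Promise<Consolidated | undefined> {
  const root = baseUrl.replace(/\/+$/, ''); // guard against trailing-slash issues
  // 1. Zarr v3: consolidated metadata lives inside the root zarr.json
  const v3 = (await fetchJson(`${root}/zarr.json`)) as
    | { consolidated_metadata?: { metadata: Record<string, unknown> } }
    | undefined;
  if (v3?.consolidated_metadata) {
    return { metadata: v3.consolidated_metadata.metadata, zarrFormat: 3 };
  }
  // 2. Zarr v2: fall back to .zmetadata and normalize to the v3-shaped structure
  const v2 = (await fetchJson(`${root}/.zmetadata`)) as
    | { metadata?: Record<string, unknown> }
    | undefined;
  if (v2?.metadata) {
    return { metadata: v2.metadata, zarrFormat: 2 }; // real flow would normalize to v3 here
  }
  return undefined;
}
```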
```mermaid
sequenceDiagram
    participant User
    participant CLI as pnpm
    participant Orch as python/scripts/generate_fixtures.py
    participant UV as uv
    participant Gen050 as python/v0.5.0/generate_fixtures.py
    participant Gen061 as python/v0.6.1/generate_fixtures.py
    participant Gen070 as python/v0.7.0/generate_fixtures.py
    participant Out as test-fixtures
    User->>CLI: run test:fixtures:generate --all
    CLI->>Orch: start orchestration
    Orch->>UV: uv sync v0.5.0
    UV-->>Gen050: run generator
    Gen050->>Out: write v0.5.0/blobs.zarr
    Orch->>UV: uv sync v0.6.1
    UV-->>Gen061: run generator
    Gen061->>Out: write v0.6.1/blobs.zarr
    Orch->>UV: uv sync v0.7.0
    UV-->>Gen070: run generator
    Gen070->>Out: write v0.7.0/blobs.zarr
    Orch-->>User: aggregated success / exit code
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 8
🧹 Nitpick comments (33)
vitest.config.ts (1)
1-24: Ensure `__dirname` works with your Vitest module format

If your project is using ESM (e.g. `"type": "module"` in `package.json` or TS compiled as ESM), `__dirname` won't be defined in `vitest.config.ts`, which would break test startup. In that case, consider computing it from `import.meta.url`:

```diff
-import { defineConfig } from 'vitest/config';
-import { resolve } from 'node:path';
-
-export default defineConfig({
+import { defineConfig } from 'vitest/config';
+import { resolve, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+
+export default defineConfig({
   test: {
@@
   resolve: {
     alias: {
       '@spatialdata/core': resolve(__dirname, 'packages/core/src'),
       '@spatialdata/zarrextra': resolve(__dirname, 'packages/zarrextra/src'),
     },
   },
 });
```

If the repo is still CJS-only, the existing `__dirname` usage is fine, but it's worth confirming the module type to avoid subtle config failures.

packages/core/src/store/index.ts (1)
82-119: Consider a safer `toJSON` fallback than returning `this`

`toJSON()` currently returns the instance itself when `this.rootStore.tree` is falsy, which can be surprising and risks odd serialization behavior (or recursion) if `JSON.stringify` is used on `SpatialData`. Since `ConsolidatedStore.tree` should normally exist, a minimal or empty fallback is safer:

```diff
 toJSON() {
-  if (!this.rootStore.tree) return this;
-  return serializeZarrTree(this.rootStore.tree);
+  if (!this.rootStore.tree) {
+    // Fallback to a minimal representation instead of the full instance
+    return { url: this.url };
+  }
+  return serializeZarrTree(this.rootStore.tree);
 }
```

The new `toString()` with a try/catch looks good and should make debugging bad metadata less painful.

packages/zarrextra/src/types.ts (1)
45-135: Align comments/types with v3-normalized metadata and consider widening `fill_value`

You now normalize v2 `.zmetadata` to the v3-shaped `ZarrV3Metadata` and type `IntermediateConsolidatedStore` as `Store & { zmetadata: ZarrV3Metadata }`, which is a clear internal contract; the surrounding comments about a v2/v3 union and `ConsolidatedMetadata`/`MetadataFormat` could be updated or trimmed so future readers don't expect both shapes to appear at runtime.

Also, `ZarrV3ArrayNode.fill_value` is restricted to `number | string | boolean`, but real Zarr stores may use `null` or more complex types. If you ever surface `fill_value` beyond debugging, you might want a looser type (e.g. `unknown` or `number | string | boolean | null`) to better reflect what can be in the metadata.

packages/zarrextra/src/index.ts (2)
15-101: Use per-level paths for attribute lookup when building the tree

In `parseStoreContents`, you currently call:

```ts
const attrs = getZattrs(normalizedPath as zarr.AbsolutePath, store);
```

inside the loop over `pathParts`. This means that when you synthesize intermediate group nodes for paths that don't have their own metadata entries, they'll look up attributes using the full leaf path and can accidentally inherit the leaf array's attributes.

You can make attribute resolution more robust for both groups and arrays by computing the path prefix for the node you're creating:

```diff
-  for (const [i, part] of pathParts.entries()) {
+  for (const [i, part] of pathParts.entries()) {
     if (!(part in currentNode)) {
       const isLeaf = i === pathParts.length - 1;
       const isArray = isLeaf && info.isArray;
-
-      // Get attributes for this path (normalizedPath already has leading /)
-      const attrs = getZattrs(normalizedPath as zarr.AbsolutePath, store);
+      // Get attributes for this node using its own path prefix
+      const currentPath = `/${pathParts.slice(0, i + 1).join('/')}`;
+      const attrs = getZattrs(currentPath as zarr.AbsolutePath, store);
@@
       currentNode[part] = {
         [ATTRS_KEY]: attrs,
         [ZARRAY_KEY]: zarray,
         get: () => zarr.open(root.resolve(normalizedPath), { kind: 'array' })
       };
```

This keeps array lookups unchanged while ensuring group nodes only ever consult their own metadata paths.
134-224: Preserve v2 `dtype` when normalizing to v3 `data_type`

In `normalizeV2ToV3Metadata`, array nodes default `data_type` to `'float64'` unless `zarray.data_type` exists:

```ts
data_type: (zarray.data_type as string) || 'float64',
```

For v2 `.zarray` metadata, the field is typically `dtype`, not `data_type`, so you'll silently treat all such arrays as `float64` in your normalized metadata even when the underlying dtype is different. If any downstream logic introspects `data_type`, it will see incorrect values.

You can cheaply preserve the v2 information by falling back to `dtype`:

```diff
-  if (group.zarray) {
-    // Array node - use zarray metadata as the base
-    const zarray = group.zarray as Record<string, unknown>;
-    metadata[path] = {
-      shape: (zarray.shape as number[]) || [],
-      data_type: (zarray.data_type as string) || 'float64',
+  if (group.zarray) {
+    // Array node - use zarray metadata as the base
+    const zarray = group.zarray as Record<string, unknown>;
+    const dtype = (zarray.data_type ?? zarray.dtype) as string | undefined;
+    metadata[path] = {
+      shape: (zarray.shape as number[]) || [],
+      data_type: dtype || 'float64',
@@
-      zarr_format: (zarray.zarr_format as number) || 3,
+      zarr_format: (zarray.zarr_format as number) || 3,
```

This keeps the internal v3-like shape while more accurately reflecting v2 array metadata.
.vscode/settings.json (1)
1-14: Consider whether to commit IDE settings.

Committing `.vscode/settings.json` directly can cause conflicts when team members have different setups (e.g., different Python versions or custom extensions). A common practice is to either:

- Add `.vscode/settings.json` to `.gitignore` and provide a `.vscode/settings.json.sample` as a template
- Keep only team-agreed settings and document others in README
This is minor since the settings are Python-related and align with the project's testing infrastructure.
.github/workflows/test.yml (2)
31-33: Consider pinning the `uv` version for reproducibility.

Using `version: "latest"` may cause unexpected CI failures when a new uv version introduces breaking changes. Consider pinning to a specific version.

```diff
 - uses: astral-sh/setup-uv@v4
   with:
-    version: "latest"
+    version: "0.5.10" # or current stable version
```
55-74: Add failure handling when the server doesn't start.

If the server fails to start within 30 seconds, the loop exits silently and integration tests run anyway (likely failing with confusing errors). Consider adding explicit failure detection.

The shellcheck warning about unused `i` is a false positive for this retry-loop pattern.

```diff
 # Wait for server to be ready (up to ~30s)
+SERVER_READY=false
 for i in {1..30}; do
   if curl -sSf http://localhost:8080/ >/dev/null; then
     echo "Test server is up"
+    SERVER_READY=true
     break
   fi
   echo "Waiting for test server on http://localhost:8080/ ..."
   sleep 1
 done

+if [ "$SERVER_READY" != "true" ]; then
+  echo "ERROR: Test server failed to start within 30 seconds"
+  kill "$SERVER_PID" 2>/dev/null || true
+  exit 1
+fi
+
 # Run integration tests (will hit http://localhost:8080/…)
 pnpm test:integration
```

scripts/cors-proxy.js (1)
40-119: Consider adding a request timeout.

The `fetch` call has no timeout, which could cause the proxy to hang indefinitely if the target server doesn't respond. For a development tool this may be acceptable, but consider adding a timeout for better developer experience.

```diff
   // Make the request
-  const response = await fetch(targetUrl, fetchOptions);
+  const controller = new AbortController();
+  const timeoutId = setTimeout(() => controller.abort(), 30000); // 30s timeout
+
+  try {
+    const response = await fetch(targetUrl, { ...fetchOptions, signal: controller.signal });
+    clearTimeout(timeoutId);
```

scripts/test-server.js (2)
78-143: Consider adding an OPTIONS preflight handler for CORS.

The server adds CORS headers to GET responses but doesn't handle OPTIONS preflight requests explicitly. While browsers may not send preflight for simple GET requests, adding OPTIONS support would ensure consistent behavior with the CORS proxy.

```diff
 async function handleRequest(req, res) {
   try {
+    // Handle OPTIONS preflight
+    if (req.method === 'OPTIONS') {
+      res.writeHead(200, {
+        'Access-Control-Allow-Origin': '*',
+        'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
+        'Access-Control-Allow-Headers': 'Range',
+      });
+      res.end();
+      return;
+    }
+
     // Remove query string and normalize path
```
91-99: Path traversal check has a potential edge case with symlinks.

The check `!fullPath.startsWith(fixturesDir)` validates the resolved path, which is good. However, if `fixturesDir` contains symlinks pointing outside the fixtures directory, they could bypass this check. For a test server this is low risk, but consider using `realpath` for stricter validation if needed.
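If stricter validation is wanted, one option is to resolve symlinks on both sides before the prefix comparison; a sketch, assuming `fixturesDir` and `fullPath` are the same variables used by the test server:

```ts
import { realpathSync } from 'node:fs';
import { sep } from 'node:path';

// Sketch: compare symlink-resolved paths instead of the raw resolved path.
function isInsideFixtures(fullPath: string, fixturesDir: string): boolean {
  const realBase = realpathSync(fixturesDir);
  let realTarget: string;
  try {
    realTarget = realpathSync(fullPath);
  } catch {
    return false; // nonexistent paths are rejected here; 404 handling can happen elsewhere
  }
  return realTarget === realBase || realTarget.startsWith(realBase + sep);
}
```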
tests/unit/schemas.test.ts (1)

197-302: Consider adding negative test cases for invalid data.

The tests validate happy paths well, but could benefit from additional negative test cases, such as:

- Invalid `encoding-type` values
- Missing required fields in `tableAttrsSchema`
- Malformed coordinate systems

Example addition for `tableAttrsSchema`:

```ts
it('should reject missing required fields', () => {
  const attrs = {
    instance_key: 'cell_id',
    // missing region, region_key, spatialdata-encoding-type
  };
  expect(() => tableAttrsSchema.parse(attrs)).toThrow();
});
```

README.md (1)
184-186: Add language specifier to fenced code block.

The fenced code block is missing a language specifier. Based on static analysis hint (MD040).

````diff
-```
+```text
 http://localhost:8081/https://example.com/data.zarr/.zattrs
 ```
````

package.json (1)

14-26: Well-organized test scripts with good naming conventions.

The scripts provide a comprehensive testing workflow. Minor consideration: the `validate:all` script uses `bash scripts/validate-all.sh`, which may not work on Windows without WSL or Git Bash. Consider using a cross-platform alternative or documenting this limitation.

```diff
-    "validate:all": "bash scripts/validate-all.sh",
+    "validate:all": "node scripts/validate-all.js",
```

Alternatively, add a note in the README that this script requires a bash-compatible shell.
tests/integration/fixtures.test.ts (2)
16-35: Consider escaping the version parameter in the shell command.

While `version` comes from a controlled array, the use of string interpolation in `execSync` could be a concern if the test matrix is ever extended with external input.

```diff
 function ensureFixtures(version: string): string {
   const fixturePath = join(projectRoot, 'test-fixtures', `v${version}`, 'blobs.zarr');

   if (!existsSync(fixturePath)) {
     console.log(`Fixtures not found for version ${version}, generating...`);
     try {
-      execSync(`uv run python/scripts/generate_fixtures.py --version ${version}`, {
+      execSync(`uv run python/scripts/generate_fixtures.py --version "${version}"`, {
         cwd: projectRoot,
         stdio: 'inherit',
       });
```

Or use `spawnSync` with an arguments array to avoid shell interpretation entirely.
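A `spawnSync` variant would look roughly like this (sketch; it reuses the `version` and `projectRoot` values from the test above):

```ts
import { spawnSync } from 'node:child_process';

// Arguments are passed as an array, so no shell ever parses `version`.
const run = spawnSync('uv', ['run', 'python/scripts/generate_fixtures.py', '--version', version], {
  cwd: projectRoot,
  stdio: 'inherit',
});
if (run.status !== 0) {
  throw new Error(`Fixture generation failed for version ${version}`);
}
```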
49-132: Tests silently pass when the server is not running.

The tests return early with a `console.warn` when the server isn't running, which means they'll appear to pass in CI even when not actually exercised. Consider using `it.skip()` or throwing a skip exception to make this more visible.

```diff
   } catch (error) {
     if (error instanceof Error && error.message.includes('fetch')) {
-      console.warn(
-        'Skipping integration test - test server not running. ' +
-        'Start it with: pnpm test:server'
-      );
-      return;
+      // Use Vitest's skip API to make skipped tests visible
+      throw new Error(
+        'Test skipped: test server not running. Start it with: pnpm test:server'
+      );
     }
     throw error;
   }
```

Alternatively, use a `beforeAll` hook to check server availability and skip the entire describe block if unavailable.
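One possible shape for that alternative is to probe the server once and let Vitest report the suite as skipped; the URL and fixture path below are placeholders, not taken from the actual tests:

```ts
import { describe, it, expect } from 'vitest';

// Probe once at collection time; top-level await works in ESM test files.
const serverUp = await fetch('http://localhost:8080/').then(
  (res) => res.ok,
  () => false,
);

// Skipped suites are reported as skipped instead of silently passing.
describe.skipIf(!serverUp)('fixture integration (requires pnpm test:server)', () => {
  it('serves the v0.6.1 blobs fixture', async () => {
    const res = await fetch('http://localhost:8080/v0.6.1/blobs.zarr/zarr.json');
    expect(res.ok).toBe(true);
  });
});
```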
packages/core/src/models/index.ts (1)

480-504: Error handling in loadElements prevents cascading failures.

The try-catch around `createElement` (lines 495-502) is a reasonable trade-off to prevent a single malformed element from breaking the entire load. The comment appropriately acknowledges the potential confusion of returning `{}` vs `undefined`.

Consider tracking failed elements and exposing them for debugging:

```ts
const result: Record<string, ElementInstanceMap[T]> = {};
const errors: Record<string, Error> = {};

for (const key of keys) {
  try {
    result[key] = createElement(name, sdata, key);
  } catch (error) {
    errors[key] = error as Error;
    console.error(`Failed to load ${name}/${key}:`, error);
  }
}

// Optionally return { elements: result, errors } or attach to metadata
return Object.keys(result).length > 0 ? result : undefined;
```

This would give consumers visibility into partial load failures.

scripts/validate-all.sh (1)
scripts/validate-all.sh (1)
1-14: Script setup looks good, but consider adding dependency checks.The script uses external tools (
uv,pnpm,jq,node) without verifying they're available. Consider adding dependency checks at the start to provide clearer error messages.set -e # Exit on error +# Check required dependencies +for cmd in uv pnpm jq node; do + if ! command -v "$cmd" &> /dev/null; then + echo -e "${RED}Error: Required command '$cmd' not found${NC}" + exit 1 + fi +done + # Colors for outputpython/v0.6.1/generate_fixtures.py (3)
27-29: Remove extraneous f-string prefix.

Line 29 has an f-string without any placeholders. As flagged by Ruff (F541).

```diff
     if actual_version != version:
         print(f"⚠️  Warning: Expected version {version} but got {actual_version}")
-        print(f"   This may indicate the wrong environment is active.")
+        print("   This may indicate the wrong environment is active.")
```
42-46: Move import to top of file.

The `shutil` import inside the function is unconventional. Consider moving it to the top-level imports for better readability and consistency with Python conventions.

```diff
 import sys
 from pathlib import Path
+import shutil

 # Add scripts directory to path if we need shared utilities
```

Then remove line 43:

```diff
-    import shutil
```
85-90: Rename unused loop variable.

The variable `elem` is not used within the loop body. As flagged by Ruff (B007).

```diff
     elif isinstance(elements, list):
         print(f"  - {element_type}: {len(elements)} element(s)")
-        for i, elem in enumerate(elements):
+        for i, _elem in enumerate(elements):
             print(f"    * element_{i}")
```

python/v0.5.0/generate_fixtures.py (6)
27-29: Remove extraneous f-string prefix.

Line 29 has an f-string without any placeholders. As flagged by Ruff (F541).

```diff
     if actual_version != version:
         print(f"⚠️  Warning: Expected version {version} but got {actual_version}")
-        print(f"   This may indicate the wrong environment is active.")
+        print("   This may indicate the wrong environment is active.")
```
50-54: Avoid bare `except` clause.

The bare `except:` will catch all exceptions including `KeyboardInterrupt` and `SystemExit`, which is generally undesirable. As flagged by Ruff (E722).

```diff
         try:
             delattr(sdata, "points")
-        except:
+        except Exception:
             # If that fails, set to empty dict
             sdata.points = {}
```
72-79: Avoid bare `except` with pass and consider logging.

The `try-except-pass` pattern silently swallows errors. While this is defensive cleanup code, logging the exception would help with debugging. As flagged by Ruff (E722, S110).

```diff
     # Double-check it's gone
     if os.path.exists(store_path):
         print(f"Warning: Store path still exists after removal attempt: {store_path}")
         # Force remove
         try:
             shutil.rmtree(store_path, ignore_errors=True)
-        except:
-            pass
+        except Exception as e:
+            print(f"Warning: Force removal also failed: {e}")
```
93-102: Consider catching more specific exceptions for cleanup.

The broad `Exception` catch and subsequent bare `except:` make debugging harder. Consider at minimum logging the cleanup failure.

```diff
     except Exception as e:
         # Clean up any partially written store
         if os.path.exists(store_path):
             try:
                 if os.path.isdir(store_path):
                     shutil.rmtree(store_path)
                 else:
                     os.remove(store_path)
-            except:
-                pass
+            except OSError as cleanup_error:
+                print(f"Warning: Cleanup failed: {cleanup_error}")
```
131-134: Rename unused loop variable.

The variable `elem` is not used within the loop body. As flagged by Ruff (B007).

```diff
     elif isinstance(elements, list):
         print(f"  - {element_type}: {len(elements)} element(s)")
-        for i, elem in enumerate(elements):
+        for i, _elem in enumerate(elements):
             print(f"    * element_{i}")
```
20-148: Significant code duplication with the v0.6.1 generator.

The `generate_fixtures` function shares substantial logic with `python/v0.6.1/generate_fixtures.py`. The main differences are the version string and the points workaround. Consider extracting shared utilities to reduce maintenance burden.

A shared module in `python/scripts/` could contain:

- The element enumeration/printing logic (lines 121-144)
- The coordinate systems printing logic (lines 138-144)
- The common write fallback pattern

This would make version-specific scripts smaller and easier to maintain.
scripts/validate-datasets-js.js (1)
223-262: Consider using a CLI parsing library.

The manual argument parsing works but is verbose and error-prone. For a script of this complexity, consider using a library like `commander` or `yargs`, or at minimum, add validation for unknown arguments.

Currently unknown arguments are silently ignored. Consider:

```js
} else if (!arg.startsWith('--')) {
  // Positional arguments - could be intentional or typo
} else {
  console.error(`Unknown option: ${arg}`);
  process.exit(1);
}
```
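If adding a dependency is undesirable, Node's built-in `util.parseArgs` (Node 18+) rejects unknown flags by default; a sketch with hypothetical option names:

```ts
import { parseArgs } from 'node:util';

// strict mode (the default) throws on unrecognized options instead of ignoring them.
const { values, positionals } = parseArgs({
  allowPositionals: true,
  options: {
    dataset: { type: 'string' },
    proxy: { type: 'boolean', default: false },
    output: { type: 'string' },
  },
});

console.log({ values, positionals });
```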
python/scripts/validate_datasets.py (5)

172-181: Remove unused exception variable.

The exception variable `e` is captured but never used. As per static analysis (Ruff F841).

```diff
-        except (json.JSONDecodeError, IndexError) as e:
+        except (json.JSONDecodeError, IndexError):
```
257-281: Remove unnecessary f-string prefixes.

Several strings have the `f` prefix but contain no placeholders. As per static analysis (Ruff F541).

```diff
     if result.success:
-        lines.append(f"**Status:** ✅ Success")
+        lines.append("**Status:** ✅ Success")
         lines.append("")
     else:
-        lines.append(f"**Status:** ❌ Failed")
+        lines.append("**Status:** ❌ Failed")
         lines.append("")
         lines.append(f"**Error Type:** `{result.error_type}`")
         lines.append("")
-        lines.append(f"**Error Message:**")
+        lines.append("**Error Message:**")
```
219-225: Consider extracting version constants to avoid duplication.

The version strings `"0.5.0"` and `"0.6.1"` are hardcoded in multiple places (here, line 249, line 342, line 399). Extracting them to a module-level constant would improve maintainability when adding new versions.

```python
# Near top of file
SUPPORTED_VERSIONS = ["0.5.0", "0.6.1"]
```
401-413: Consider verifying the environment directory exists before sync.

If the environment directory `python/v{version}` doesn't exist, `uv sync` will fail with a potentially confusing error. Adding a pre-check would provide clearer feedback.

```diff
     # Ensure environments are set up
     for version in versions:
         env_dir = project_root / "python" / f"v{version}"
+        if not env_dir.exists():
+            print(f"Error: Environment directory not found: {env_dir}", file=sys.stderr)
+            print("Please ensure the version-specific directories are set up.", file=sys.stderr)
+            sys.exit(1)
         print(f"Setting up environment for spatialdata {version}...", file=sys.stderr)
```
474-477: Unreachable else branch.

The `argparse` `choices` constraint ensures `output_format` is always one of `markdown`, `csv`, or `json`. The `else` branch is unreachable dead code.

```diff
     elif args.output_format == "json":
         output = json.dumps([r.to_dict() for r in results], indent=2)
-    else:
-        output = ""
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
- `.cursor/settings.json` (1 hunks)
- `.github/workflows/test.yml` (1 hunks)
- `.gitignore` (1 hunks)
- `.vscode/settings.json` (1 hunks)
- `README.md` (2 hunks)
- `package.json` (1 hunks)
- `packages/core/src/models/index.ts` (6 hunks)
- `packages/core/src/schemas/index.ts` (3 hunks)
- `packages/core/src/store/index.ts` (3 hunks)
- `packages/core/src/types.ts` (2 hunks)
- `packages/zarrextra/src/index.ts` (2 hunks)
- `packages/zarrextra/src/types.ts` (1 hunks)
- `python/.python-version` (1 hunks)
- `python/scripts/generate_fixtures.py` (1 hunks)
- `python/scripts/validate_datasets.py` (1 hunks)
- `python/v0.5.0/generate_fixtures.py` (1 hunks)
- `python/v0.5.0/pyproject.toml` (1 hunks)
- `python/v0.6.1/generate_fixtures.py` (1 hunks)
- `python/v0.6.1/pyproject.toml` (1 hunks)
- `scripts/cors-proxy.js` (1 hunks)
- `scripts/test-server.js` (1 hunks)
- `scripts/validate-all.sh` (1 hunks)
- `scripts/validate-datasets-js.js` (1 hunks)
- `tests/integration/fixtures.test.ts` (1 hunks)
- `tests/unit/schemas.test.ts` (1 hunks)
- `vitest.config.ts` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (12)
scripts/test-server.js (1)
- scripts/cors-proxy.js (2): `PORT` (15-15), `server` (167-167)

tests/integration/fixtures.test.ts (3)
- scripts/test-server.js (3): `__filename` (15-15), `__dirname` (16-16), `projectRoot` (17-17)
- scripts/validate-datasets-js.js (1): `sdata` (79-79)
- packages/core/src/store/index.ts (1): `readZarr` (122-132)

vitest.config.ts (1)
- scripts/test-server.js (1): `__dirname` (16-16)

tests/unit/schemas.test.ts (1)
- packages/core/src/schemas/index.ts (6): `rasterAttrsSchema` (322-345), `coordinateTransformationSchema` (107-107), `shapesAttrsSchema` (370-377), `pointsAttrsSchema` (385-392), `tableAttrsSchema` (399-406), `spatialDataSchema` (413-416)

packages/core/src/store/index.ts (3)
- packages/zarrextra/src/index.ts (3): `ConsolidatedStore` (384-384), `serializeZarrTree` (353-381), `openExtraConsolidated` (297-320)
- packages/zarrextra/src/types.ts (1): `ConsolidatedStore` (144-147)
- packages/core/src/types.ts (2): `ElementNames` (28-28), `ElementName` (29-29)

packages/core/src/models/index.ts (2)
- packages/core/src/types.ts (1): `ZarrTree` (76-76)
- packages/zarrextra/src/types.ts (1): `ZarrTree` (40-43)

scripts/validate-datasets-js.js (2)
- packages/core/src/store/index.ts (1): `readZarr` (122-132)
- python/scripts/validate_datasets.py (1): `main` (335-486)

python/v0.5.0/generate_fixtures.py (2)
- python/v0.6.1/generate_fixtures.py (2): `generate_fixtures` (20-102), `main` (105-125)
- python/scripts/generate_fixtures.py (1): `main` (15-96)

python/scripts/generate_fixtures.py (2)
- python/v0.5.0/generate_fixtures.py (1): `main` (151-171)
- python/v0.6.1/generate_fixtures.py (1): `main` (105-125)

packages/core/src/types.ts (2)
- packages/zarrextra/src/index.ts (1): `ConsolidatedStore` (384-384)
- packages/zarrextra/src/types.ts (1): `ConsolidatedStore` (144-147)

python/v0.6.1/generate_fixtures.py (1)
- python/v0.5.0/generate_fixtures.py (2): `generate_fixtures` (20-148), `main` (151-171)

packages/zarrextra/src/index.ts (3)
- packages/zarrextra/src/types.ts (10): `IntermediateConsolidatedStore` (133-135), `ZarrTree` (40-43), `ZarrV3GroupNode` (86-95), `ZarrV3ArrayNode` (56-81), `ZAttrsAny` (11-11), `ATTRS_KEY` (16-16), `ZARRAY_KEY` (21-21), `ZarrV3Metadata` (103-112), `ZarrV2Metadata` (49-51), `ConsolidatedStore` (144-147)
- packages/core/src/types.ts (7): `ZarrTree` (76-76), `ZAttrsAny` (76-76), `ATTRS_KEY` (77-77), `ZARRAY_KEY` (77-77), `Result` (81-81), `Err` (82-82), `Ok` (82-82)
- packages/zarrextra/src/result.ts (3): `Result` (17-19), `Err` (25-25), `Ok` (22-22)
🪛 actionlint (1.7.9)
.github/workflows/test.yml
55-55: shellcheck reported issue in this script: SC2034:warning:6:1: i appears unused. Verify use (or export if used externally)
(shellcheck)
🪛 markdownlint-cli2 (0.18.1)
README.md
184-184: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.8)
python/scripts/validate_datasets.py
151-151: subprocess call: check for execution of untrusted input
(S603)
152-152: Starting a process with a partial executable path
(S607)
172-172: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
192-192: Do not catch blind exception: Exception
(BLE001)
258-258: f-string without any placeholders
Remove extraneous f prefix
(F541)
274-274: f-string without any placeholders
Remove extraneous f prefix
(F541)
278-278: f-string without any placeholders
Remove extraneous f prefix
(F541)
405-405: subprocess call: check for execution of untrusted input
(S603)
406-406: Starting a process with a partial executable path
(S607)
python/v0.5.0/generate_fixtures.py
29-29: f-string without any placeholders
Remove extraneous f prefix
(F541)
52-52: Do not use bare except
(E722)
69-69: Do not catch blind exception: Exception
(BLE001)
78-78: Do not use bare except
(E722)
78-79: try-except-pass detected, consider logging the exception
(S110)
101-101: Do not use bare except
(E722)
101-102: try-except-pass detected, consider logging the exception
(S110)
112-114: Avoid specifying long messages outside the exception class
(TRY003)
133-133: Loop control variable elem not used within loop body
Rename unused elem to _elem
(B007)
python/scripts/generate_fixtures.py
57-57: subprocess call: check for execution of untrusted input
(S603)
58-58: Starting a process with a partial executable path
(S607)
74-74: subprocess call: check for execution of untrusted input
(S603)
75-80: Starting a process with a partial executable path
(S607)
python/v0.6.1/generate_fixtures.py
29-29: f-string without any placeholders
Remove extraneous f prefix
(F541)
87-87: Loop control variable elem not used within loop body
Rename unused elem to _elem
(B007)
🔇 Additional comments (39)
.gitignore (3)
12-13: LGTM: Test artifact directories appropriately ignored.

The additions of `test-fixtures/` and `validation-results/` align with the test infrastructure introduced in this PR and keep generated artifacts out of version control.

15-19: LGTM: Version-specific Python environments properly isolated.

The ignore rules for `python/v0.5.0/.venv/`, `python/v0.6.1/.venv/`, and their respective `uv.lock` files are appropriate for isolating test environments across multiple SpatialData versions without polluting the repository.

26-28: Clarify: Comment mentions `.cursor/settings.json` but no corresponding ignore rule.

Line 26 notes that `.cursor/settings.json` is committed for Python interpreter setup, but the `.gitignore` does not include any rules for the `.cursor/` directory (unlike the selective rules for `.vscode/`). This could lead to unintended commits of `.cursor/` files in the future.

Please either:

- Add a note explaining that `.cursor/` is not used or doesn't need selective ignoring, or
- Add corresponding ignore rules for `.cursor/` (e.g., `.cursor/launch.json` and `.cursor/tasks.json`), similar to the `.vscode/` entries.

python/.python-version (1)

1-2: Python version pin matches project configs

Pinning Python to 3.12 here is consistent with the `requires-python = ">=3.12"` settings in the fixture pyprojects; just ensure CI/dev environments also run 3.12 so behavior matches local testing.

python/v0.5.0/pyproject.toml (1)

1-8: Check `spatialdata==0.5.0` support for Python 3.12

`requires-python = ">=3.12"` plus `spatialdata==0.5.0` is fine if that release declares/supports 3.12; otherwise environment resolution could fail or behave oddly. Please double-check spatialdata 0.5.0's supported Python versions and adjust `requires-python` or the pinned version if needed.

.cursor/settings.json (1)

1-14: Cursor Python configuration looks consistent with new fixture envs

Pointing `python.defaultInterpreterPath` at `python/v0.6.1/.venv/bin/python3` and documenting the 0.5.0 alternative in comments is a reasonable default; this should work well as long as contributors switch the path if they primarily work against 0.5.0 fixtures.

python/v0.6.1/pyproject.toml (1)

1-8: Verify `spatialdata==0.6.1` on Python 3.12

This env also requires Python ≥3.12 with `spatialdata==0.6.1`. Please confirm that 0.6.1 officially supports 3.12 so fixture generation doesn't hit installation/runtime issues; if not, you may need to relax `requires-python` or choose a different Python version for this env.

packages/core/src/types.ts (2)

17-17: LGTM!

Import of `ConsolidatedStore` is appropriate for the updated type definition.

88-93: Breaking change to `SDataProps` interface.

The `rootStore` type change to `ConsolidatedStore` is a breaking change that affects consumers of this interface. All internal codebase usages have been updated and are compatible with the new type—both the `.tree` and `.zarritaStore` properties are properly accessed throughout the codebase. This aligns with the PR's goal to support zarr v3.
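For readers without the diff open, the shapes implied by this comment are roughly as follows; the field names are inferred from the review discussion (`tree`, `zarritaStore`) and are an approximation, not the actual source:

```ts
// Approximation of the types discussed above, not copied from packages/core.
interface ConsolidatedStoreSketch {
  zarritaStore: unknown; // store handle used for reads via zarrita
  tree: unknown;         // path-based ZarrTree built from consolidated metadata
}

interface SDataPropsSketch {
  rootStore: ConsolidatedStoreSketch; // previously a different parsed-store type
}
```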
scripts/cors-proxy.js (1)

1-184: LGTM overall with security caveat noted.

The CORS proxy implementation is well-structured for local development. The warning about being development-only is appropriate. The SSRF potential is acknowledged by design since this is meant to proxy arbitrary URLs.
Key positives:
- Proper CORS header handling
- URL validation requiring http(s)://
- Graceful shutdown handling
- Good error handling with CORS headers preserved on errors
tests/unit/schemas.test.ts (3)
1-10: LGTM! Well-structured test imports and organization.

The imports are clean and cover all the necessary schemas being tested. Good separation of concerns with the test file structure.

11-100: Excellent coverage of version normalization logic.

The rasterAttrsSchema tests comprehensively cover:

- v0.6.1 format normalization (nested `ome` key extraction)
- v0.5.0 format passthrough
- Preservation of omero data from v0.6.1

This directly validates the core PR objective of supporting both zarr v2 and v3 stores.

102-195: Good transformation schema test coverage.

Tests cover all transformation types: scale, translation, affine, identity, sequence, and coordinate system references. The empty array rejection test (lines 190-194) validates the `.min(1)` constraint from the schema definition.

README.md (2)
21-22: LGTM! Clear prerequisites documentation.

Good addition of Python and uv requirements for test fixture generation.

53-308: Comprehensive testing documentation.

Excellent documentation covering the complete testing workflow, including:

- Environment setup for multiple SpatialData versions
- Fixture generation commands
- Test server and CORS proxy usage
- Dataset validation with multiple output formats

The security warning for the CORS proxy (line 202) is appropriately highlighted with ⚠️.

python/scripts/generate_fixtures.py (4)

1-12: LGTM! Clear module documentation.

Good docstring explaining the script's purpose and its relationship to version-specific scripts.

15-40: LGTM! Clean argument parsing and path resolution.

The argument parsing with choices validation and sensible defaults is well-implemented. Path resolution from the script location is correct.

42-86: Good error handling with continuation on failure.

The script appropriately continues to the next version when one fails, collecting all errors before reporting. The static analysis warnings (S603/S607) about subprocess security are false positives here since:

- The `uv` command is a hardcoded tool name
- The version is validated via argparse `choices`
- Paths are constructed from controlled values, not user input

88-100: LGTM! Clear success/failure reporting.

Good use of exit codes and visual feedback for CI/CD integration.
tests/integration/fixtures.test.ts (1)
1-11: LGTM! Standard ESM path resolution.

Correct usage of `fileURLToPath` and `dirname` for ESM module path resolution.

packages/core/src/models/index.ts (4)

57-70: LGTM! Clean migration to tree-based store access.

The switch from the parsed store to `rootStore.tree` is well-implemented with appropriate null checks and error messages.

206-211: Stricter validation is appropriate but is a breaking change.

Switching from `console.warn` + fallback to throwing on validation failure is the right approach for data integrity. However, this is a breaking change for consumers who previously relied on the fallback behavior.

Ensure this breaking change is documented in release notes. Consumers who handle malformed attrs may need to update their error handling.

336-344: LGTM! Consistent validation and store access pattern.

ShapesElement follows the same stricter validation pattern and correctly uses `zarritaStore` from the root store for geometry loading.

392-398: LGTM! Consistent with other spatial elements.

PointsElement follows the established pattern for validation and store access.
scripts/validate-all.sh (2)
37-47: Python validation step is well-structured.

The script correctly captures both stdout and stderr to a file while still allowing the script to fail on error. The dual-pass approach (JSON then Markdown) is a reasonable pattern for generating multiple output formats.

112-115: Relative symlink creation is correct.

Using a relative path (`$TIMESTAMP`) for the symlink target ensures portability. The `rm -f` before `ln -s` handles existing symlinks properly.

python/v0.6.1/generate_fixtures.py (1)

52-71: Exception handling with multiple fallbacks is reasonable for cross-version compatibility.

The cascading fallback approach (`write` → `write` without overwrite → `write_zarr` → `sd.io.write_zarr`) handles API variations across spatialdata versions well. The nested try-except structure, while verbose, ensures maximum compatibility.

scripts/validate-datasets-js.js (3)

9-11: Import from dist directory requires a build step.

The import from `../packages/core/dist/index.js` requires the package to be built first. This is handled by `validate-all.sh` but should be documented in the script's help text or comments.

60-114: Dataset validation function is well-implemented.

Good error handling with structured result objects, stack trace capture (limited to 5 lines), and flexible proxy support. The element extraction logic handles the SpatialData structure correctly.

358-361: Top-level error handling is appropriate.

The `.catch()` handler ensures any unhandled promise rejections result in a non-zero exit code, which is important for CI integration.

packages/core/src/schemas/index.ts (4)

257-283: V0.5.0 schema correctly captures top-level multiscales structure.

The schema appropriately uses `.passthrough()` to preserve unvalidated fields, which is important for forward compatibility.

285-315: V0.6.1 schema correctly captures nested OME structure.

The nested `ome` object with its own `.passthrough()` allows for extension fields within the ome namespace while still validating the core structure.

347-364: Explicit RasterAttrs type provides a clear contract for consumers.

The explicit type definition with the `[key: string]: unknown` index signature correctly represents the normalized output structure and allows for additional passthrough fields.
322-345: Transform logic correctly normalizes both schema formats.

The transform works as intended:
- Detects v0.6.1 format by checking for 'ome' key
- Extracts multiscales/omero from nested structure
- Preserves spatialdata_attrs and other top-level fields
- Falls through for v0.5.0 format
Type assertions (`as RasterAttrs`, `as RasterAttrs['multiscales']`) are necessary given the transform's complexity and are already covered by comprehensive unit tests in `tests/unit/schemas.test.ts` that validate both v0.5.0 and v0.6.1 formats, including omero data preservation.

python/scripts/validate_datasets.py (5)

1-19: LGTM!

Imports are appropriate and well-organized for the script's functionality.

21-36: LGTM!

Clean dataclass definition with appropriate use of `Optional` for nullable fields.

290-326: LGTM!

Clean CSV generation implementation. Local imports are acceptable here since this output format may not always be used.

329-332: LGTM!

Correct wrapper pattern for multiprocessing pool.

455-463: LGTM!

Multiprocessing with `imap` provides ordered results while allowing progress reporting. Appropriate for a validation script.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 0
🧹 Nitpick comments (2)
scripts/validate-datasets-js.js (2)
60-114: Normalize thrown errors and stabilize the result schema for easier downstream consumption.

To make JSON/CSV consumers simpler and avoid surprises if a non-`Error` value is ever thrown, you can normalize `error` and always initialize `stackTrace` on the result:

```diff
 async function validateDataset(dataset, useProxy = false) {
   const result = {
     datasetName: dataset.name,
     datasetUrl: dataset.url,
     implementation: 'JavaScript (@spatialdata/core)',
     success: false,
     errorType: null,
     errorMessage: null,
     elements: null,
-    coordinateSystems: null,
+    coordinateSystems: null,
+    stackTrace: null,
   };

   try {
@@
-  } catch (error) {
-    result.success = false;
-    result.errorType = error.constructor.name;
-    result.errorMessage = error.message;
-
-    // Include stack trace for debugging
-    if (error.stack) {
-      result.stackTrace = error.stack.split('\n').slice(0, 5).join('\n');
-    }
-  }
+  } catch (error) {
+    const err = error instanceof Error ? error : new Error(String(error));
+    result.success = false;
+    result.errorType = err.name;
+    result.errorMessage = err.message;
+
+    if (err.stack) {
+      result.stackTrace = err.stack.split('\n').slice(0, 5).join('\n');
+    }
+  }
```

This keeps the schema consistent and avoids undefined accesses if something non-standard is thrown.
169-215: Consider making coordinate system names more readable in markdown output.

If `result.coordinateSystems` contains objects rather than plain strings, `join(', ')` will render them as `[object Object]`. You could improve readability like:

```diff
-  if (result.coordinateSystems && result.coordinateSystems.length > 0) {
-    lines.push(`**Coordinate Systems:** ${result.coordinateSystems.join(', ')}`);
+  if (result.coordinateSystems && result.coordinateSystems.length > 0) {
+    const csLabels = result.coordinateSystems.map(cs =>
+      typeof cs === 'string'
+        ? cs
+        : (cs && cs.name) || JSON.stringify(cs)
+    );
+    lines.push(`**Coordinate Systems:** ${csLabels.join(', ')}`);
```

If `coordinateSystems` are already strings, this is a no-op; otherwise it produces more meaningful labels.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/validate-datasets-js.js (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-15T18:47:01.953Z
Learnt from: xinaesthete
Repo: Taylor-CCB-Group/SpatialData.js PR: 15
File: scripts/validate-all.sh:52-66
Timestamp: 2025-12-15T18:47:01.953Z
Learning: In the SpatialData.js monorepo, the validation scripts (scripts/validate-all.sh and scripts/validate-datasets-js.js) only depend on spatialdata/core and its dependency spatialdata/zarrextra, not on the react or vis packages. Therefore, when optimizing builds, it is sufficient to check only packages/core/dist before deciding to skip a build. Ensure this dependency boundary is enforced in review checks and that any changes to react/vis do not alter the validation flow.
Applied to files:
scripts/validate-datasets-js.js
📚 Learning: 2025-12-15T18:52:01.166Z
Learnt from: xinaesthete
Repo: Taylor-CCB-Group/SpatialData.js PR: 15
File: scripts/validate-datasets-js.js:14-55
Timestamp: 2025-12-15T18:52:01.166Z
Learning: In the SpatialData official documentation's "Spatial omics datasets" page, there is a discrepancy between the "Filename" column and the actual working remote URLs in the "work with data remotely" column. When working with spatialdata datasets from S3, use the URLs from the "work with data remotely" column, not the filenames. Examples of differences: visium_hd_3.0.0_io.zarr/ vs visium_hd_3.0.0_id.zarr/ (filename vs remote URL), mouse_liver.zarr vs MouseLiver.zarr/, spacem_helanih3t3.zarr vs spacem_hepanih3t3.zarr/.
Applied to files:
scripts/validate-datasets-js.js
🔇 Additional comments (3)
scripts/validate-datasets-js.js (3)
9-55: Imports and dataset URL definitions look correct and respect the core-only dependency boundary.

Importing `readZarr` from `../packages/core/dist/index.js` keeps this script depending only on the core package, and the S3 URLs mirror the "work with data remotely" column from the SpatialData docs (even where those differ from the "Filename" column), which matches the intended behavior of the validator.

Based on learnings, this aligns with the repo's validation and dependency constraints.
329-343: CSV quoting for JSON-heavy and error fields now looks correct.

Escaping double quotes in `elements`, `coordinateSystems`, and `errorMessage` before wrapping them in quotes prevents malformed CSV, and given the controlled content of the other columns this is sufficient for the current usage.
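For reference, the quoting rule being checked corresponds to the usual RFC 4180 escape (double any embedded quotes, then wrap the field); a generic helper sketch, not the script's actual code:

```ts
// RFC 4180-style CSV field escaping (generic sketch).
function csvField(value: string): string {
  return `"${value.replace(/"/g, '""')}"`;
}

// csvField('said "hi"') === '"said ""hi"""'
```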
223-312: CLI parsing and main validation loop look solid for the current dataset set size.

Argument handling, dataset filtering (by substring), the proxy warning, and the sequential validation loop all read cleanly and should behave as expected for ~10 remote datasets. The stderr/stdout split for logs vs. report output is also appropriate for CI usage.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@packages/core/src/models/VTableSource.ts`:
- Around line 3-5: The file VTableSource.ts contains a triple-slash reference to
"vite/client" which introduces a compile-time dependency on Vite types; remove
the line '/// <reference types="vite/client" />' from the source file and
instead add a new ambient declaration file (e.g., vite-env.d.ts) at the package
root that declares the needed ImportMetaEnv/ImportMeta typings or references
"vite/client" so Vite-specific types are isolated from core source files; ensure
the new file is included by tsconfig (or package types) so consumers building
without Vite won't fail typechecks.
In `@packages/core/src/schemas/index.ts`:
- Around line 332-356: The transform in rasterAttrsSchema currently treats any
object with an ome key as OME-NGFF 0.5 which can misclassify v0.4 objects;
update the guard inside the transform (the branch that checks 'ome' in data) to
also verify that data.ome.multiscales exists and is an array before treating the
payload as OME 0.5—i.e., in the transform of rasterAttrsSchema (which unions
rasterAttrs_OME_04_Schema and rasterAttrs_OME_05_Schema) require that
ome.multiscales is an Array (and non-empty if you prefer stricter checking)
before extracting omeData.multiscales and omeData.omero and returning the
reconstructed RasterAttrs; otherwise fall back to returning data as RasterAttrs.
In `@python/v0.6.1/generate_fixtures.py`:
- Around line 27-29: Remove the unnecessary f-string prefix on the second print
that has no interpolation: in the conditional that compares actual_version and
version (the block using actual_version, version and the two print() calls),
change the second print(f" This may indicate the wrong environment is
active.") to a regular string literal so the f-prefix is removed; ensure you
only alter that print so static analyzer F541 is resolved.
- Around line 81-86: Rename the unused loop variable in the enumerate loop to
signal intentional non-use: change the loop header "for i, elem in
enumerate(elements):" to use "_elem" instead of "elem" (i.e., "for i, _elem in
enumerate(elements):") so static analysis (B007) no longer flags the unused
variable; leave the rest of the loop (the print using i) unchanged.
In `@tests/integration/fixtures.test.ts`:
- Around line 55-66: The current catch block around the FetchStore test only
checks error.message.includes('fetch') so ERR_UNSUPPORTED_ESM_URL_SCHEME and
other network/startup errors slip through; update the catch in the try/catch
that wraps the FetchStore/network call (the catch handling the variable error)
to detect a broader set of failures by checking error instanceof Error and
matching message for additional tokens such as 'ERR_UNSUPPORTED_ESM_URL_SCHEME',
'ECONNREFUSED', 'connect', or other network/fetch-related substrings (or falling
back to a pragmatic heuristic like treating non-application errors as skip
candidates), and apply the same widened detection to the other similar catch
blocks referenced in the comment so the tests gracefully skip when the test
server or environment isn’t available.
🧹 Nitpick comments (10)
scripts/validate-datasets-js.js (1)
10-10: Redundant dynamic import of the `fs` module.

`writeFileSync` is already imported from `node:fs` at line 10, but there's a dynamic import of `fs` at line 322 to use `readFileSync`. Consider using a static import for `readFileSync` alongside `writeFileSync` for consistency.

♻️ Suggested refactor

```diff
-import { writeFileSync } from 'node:fs';
+import { writeFileSync, readFileSync } from 'node:fs';
```

Then at lines 322-324:

```diff
-  const fs = await import('node:fs');
-  const data = fs.readFileSync(args.comparePython, 'utf-8');
+  const data = readFileSync(args.comparePython, 'utf-8');
```

Also applies to: 321-324
README.md (1)
193-198: Add language specifier to fenced code block.

The fenced code block at line 195 is missing a language specifier, which triggers a markdownlint warning. Since this is a URL example, use `text` or an empty specifier.

📝 Suggested fix

````diff
 **Usage:** Proxy a remote URL by encoding it directly in the path:

-```
+```text
 http://localhost:8081/https://example.com/data.zarr/.zattrs
 ```
````

python/scripts/validate_datasets.py (2)

172-172: Remove unused variable `e`.

The exception variable `e` is assigned but never used. Use `_` to indicate it's intentionally unused, or remove it entirely.

♻️ Suggested fix

```diff
-    except (json.JSONDecodeError, IndexError) as e:
+    except (json.JSONDecodeError, IndexError):
```
259-280: Remove extraneous `f` prefix from strings without placeholders.

Lines 260, 276, and 280 use f-strings but contain no placeholders. Remove the `f` prefix.

♻️ Suggested fix

```diff
-        lines.append(f"**Status:** ✅ Success")
+        lines.append("**Status:** ✅ Success")
-        lines.append(f"**Status:** ❌ Failed")
+        lines.append("**Status:** ❌ Failed")
-        lines.append(f"**Error Message:**")
+        lines.append("**Error Message:**")
```

python/v0.7.0/generate_fixtures.py (1)
81-84: Rename unused loop variable.

The loop variable `elem` is not used within the loop body. Rename it to `_elem` to indicate it's intentionally unused.

♻️ Suggested fix

```diff
     elif isinstance(elements, list):
         print(f"  - {element_type}: {len(elements)} element(s)")
-        for i, elem in enumerate(elements):
+        for i, _elem in enumerate(elements):
             print(f"    * element_{i}")
```

scripts/validate-all.sh (3)
37-50: Consider adding `set -o pipefail` for safer pipeline handling.

With `set -e`, if the `uv run` command fails but the output redirection succeeds, the script will still exit. However, if you ever change this to use pipelines (e.g., `command | tee file`), errors in the first command could be masked. Adding `set -o pipefail` near line 6 would make the script more robust.

💡 Suggested improvement

```diff
-set -e  # Exit on error
+set -eo pipefail  # Exit on error and fail on pipeline errors
```
96-99: Missing dependency check for `jq`.

The script assumes `jq` is available but doesn't verify this. If `jq` is missing, the script will fail with an unhelpful error during the summary phase, after already completing the time-consuming validation steps.

🛡️ Proposed fix to add dependency check

Add this near the top of the script (after the color definitions):

```sh
# Check required dependencies
for cmd in jq uv node pnpm; do
  if ! command -v "$cmd" &> /dev/null; then
    echo -e "${RED}Error: Required command '$cmd' not found.${NC}"
    exit 1
  fi
done
```
112-116: Symlink creation may fail silently on some systems.

The `ln -s` command creates a relative symlink, which is good. However, on some systems or edge cases (e.g., if `$OUTPUT_DIR` doesn't exist or has permission issues), this could fail. Consider adding error handling.

💡 Minor improvement for robustness

```diff
 # Create a symlink to latest results
 rm -f "$OUTPUT_DIR/latest"
-ln -s "$TIMESTAMP" "$OUTPUT_DIR/latest"
-echo -e "${GREEN}✓ Symlink created: $OUTPUT_DIR/latest${NC}"
+if ln -s "$TIMESTAMP" "$OUTPUT_DIR/latest" 2>/dev/null; then
+  echo -e "${GREEN}✓ Symlink created: $OUTPUT_DIR/latest${NC}"
+else
+  echo -e "${YELLOW}⚠ Could not create symlink (may not be supported on this filesystem)${NC}"
+fi
```

python/v0.6.1/generate_fixtures.py (1)
42-46: Consider moving the `shutil` import to module level.

The `import shutil` inside the function works but is unconventional. Module-level imports improve readability and make dependencies explicit at the top of the file.

💡 Suggested change

Move the import to the top of the file with other imports:

```diff
 import sys
 from pathlib import Path
+import shutil

 # Add scripts directory to path if we need shared utilities
```

Then remove line 43:

```diff
-    # Remove existing store if it exists (to allow regenerating fixtures)
-    import shutil
+    # Remove existing store if it exists (to allow regenerating fixtures)
```
27-36: Add a `/* @vite-ignore */` directive for the external-URL dynamic import.

Vite requires the `/* @vite-ignore */` comment to skip import analysis on external URLs; otherwise it will attempt to prebundle/resolve the module and may fail or generate warnings. The `webpackIgnore: true` comment alone is insufficient—it only affects webpack, not Vite. Add the Vite directive alongside the existing webpack comment:

Suggested fix

```diff
-  const module = await import(/* webpackIgnore: true */ 'https://cdn.vitessce.io/parquet-wasm@2c23652/esm/parquet_wasm.js');
+  const module = await import(
+    /* @vite-ignore */ /* webpackIgnore: true */
+    'https://cdn.vitessce.io/parquet-wasm@2c23652/esm/parquet_wasm.js'
+  );
```
packages/core/src/schemas/index.ts — code under review:

```ts
export const rasterAttrsSchema = z
  .union([rasterAttrs_OME_04_Schema, rasterAttrs_OME_05_Schema])
  .transform((data): RasterAttrs => {
    // Format detection is structural: if 'ome' key exists, it's OME-NGFF 0.5 format
    // This is independent of spatialdata_attrs.version (which is library version metadata)
    if ('ome' in data && data.ome && typeof data.ome === 'object') {
      const omeData = data.ome as {
        multiscales: unknown;
        omero?: unknown;
        [key: string]: unknown;
      };
      return {
        multiscales: omeData.multiscales as RasterAttrs['multiscales'],
        omero: omeData.omero as RasterAttrs['omero'],
        spatialdata_attrs: 'spatialdata_attrs' in data ? data.spatialdata_attrs : undefined,
        // Preserve any other top-level fields
        ...Object.fromEntries(
          Object.entries(data).filter(([key]) => key !== 'ome' && key !== 'spatialdata_attrs')
        ),
      } as RasterAttrs;
    }

    // Otherwise, it's already in the v0.5.0 format (top-level multiscales), return as-is
    return data as RasterAttrs;
  });
```
🧩 Analysis chain

🏁 Scripts executed against Taylor-CCB-Group/SpatialData.js:

```sh
cat -n packages/core/src/schemas/index.ts | head -400 | tail -100
rg "rasterAttrs_OME_04_Schema|rasterAttrs_OME_05_Schema" packages/core/src/schemas/index.ts -B 5 -A 5
rg "passthrough" packages/core/src/schemas/index.ts -B 3 -A 3
rg "const rasterAttrs_OME_04_Schema|const rasterAttrs_OME_05_Schema" packages/core/src/schemas/index.ts -A 30
rg "RasterAttrs\['multiscales'\]" packages/core/src/schemas/index.ts -B 5 -A 5

# Check if there are any tests that might demonstrate the issue
fd "test|spec" packages/core/src -type f | head -20
fd "test\|spec" packages/core/src --extension ts --extension js | head -20

# Find test files more broadly
find packages/core -name "*.test.ts" -o -name "*.spec.ts" | head -20

# Understand the actual schema definitions and structure better
# Check rasterAttrs_OME_04_Schema - can it have an 'ome' key due to passthrough?
rg "rasterAttrs_OME_04_Schema" packages/core/src/schemas/index.ts -B 2 -A 25 | head -50

# Check usages or tests showing this scenario
rg "rasterAttrsSchema" packages/core/src --include="*.ts" | head -15

# Check the RasterAttrs type definition to understand expected structure
rg "type RasterAttrs" packages/core/src/schemas/index.ts -A 15

# Check how rasterAttrsSchema is used/tested
rg "rasterAttrsSchema" packages/core/src -A 3 -B 1

# Check if there are any examples or test data in the codebase
find packages/core -name "*.json" -o -name "*.yml" -o -name "*.yaml" | head -20

# Check the OME_05_Schema more carefully, especially the multiscales validation
cat -n packages/core/src/schemas/index.ts | sed -n '275,330p'

# Check the actual validation - does OME_05 schema enforce structure on multiscales?
rg "rasterAttrs_OME_05_Schema" packages/core/src/schemas/index.ts -A 40
```
Validate ome.multiscales structure before treating data as OME-NGFF 0.5 format.
The current check ('ome' in data && data.ome && typeof data.ome === 'object') can misclassify data if an extra ome field exists on a v0.5.0 attrs object. Since rasterAttrs_OME_04_Schema has .passthrough(), a v0.4 format object with an unrelated ome field (e.g., {multiscales: [...], ome: {foo: 123}}) would parse successfully, then the transform would incorrectly treat it as OME 0.5 and extract omeData.multiscales which would be undefined.
Strengthen the guard to check that ome.multiscales is actually an array before branching, ensuring format detection is based on the presence of the required nested structure, not just the ome key.
🛠️ Suggested guard for `ome` detection
- if ('ome' in data && data.ome && typeof data.ome === 'object') {
- const omeData = data.ome as {
+ const ome = (data as { ome?: { multiscales?: unknown } }).ome;
+ if (
+ ome &&
+ typeof ome === 'object' &&
+ Array.isArray((ome as { multiscales?: unknown }).multiscales)
+ ) {
+ const omeData = ome as {
multiscales: unknown;
omero?: unknown;
[key: string]: unknown;
};
🤖 Prompt for AI Agents
In `@packages/core/src/schemas/index.ts` around lines 332 - 356, The transform in
rasterAttrsSchema currently treats any object with an ome key as OME-NGFF 0.5
which can misclassify v0.4 objects; update the guard inside the transform (the
branch that checks 'ome' in data) to also verify that data.ome.multiscales
exists and is an array before treating the payload as OME 0.5—i.e., in the
transform of rasterAttrsSchema (which unions rasterAttrs_OME_04_Schema and
rasterAttrs_OME_05_Schema) require that ome.multiscales is an Array (and
non-empty if you prefer stricter checking) before extracting omeData.multiscales
and omeData.omero and returning the reconstructed RasterAttrs; otherwise fall
back to returning data as RasterAttrs.
```ts
      } catch (error) {
        // If FetchStore doesn't support file:// or server isn't running,
        // we'll skip this test with a helpful message
        if (error instanceof Error && error.message.includes('fetch')) {
          console.warn(
            'Skipping integration test - test server not running. ' +
              'Start it with: pnpm test:server'
          );
          return;
        }
        throw error;
      }
```
Error detection may miss ESM URL scheme errors.
The skip logic checks if the error message includes 'fetch', but the pipeline failure shows ERR_UNSUPPORTED_ESM_URL_SCHEME. This error won't trigger the skip path, causing the test to fail instead of gracefully skipping.
Consider broadening the error detection to handle network-related errors more comprehensively.
🔧 Suggested fix
} catch (error) {
// If FetchStore doesn't support file:// or server isn't running,
// we'll skip this test with a helpful message
- if (error instanceof Error && error.message.includes('fetch')) {
+ if (error instanceof Error && (
+ error.message.includes('fetch') ||
+ error.message.includes('ECONNREFUSED') ||
+ error.message.includes('ERR_UNSUPPORTED')
+ )) {
console.warn(
'Skipping integration test - test server not running. ' +
'Start it with: pnpm test:server'
);
return;
}
throw error;
}
Apply similar changes to the other test blocks (lines 86-91, 106-111, 124-129).
🤖 Prompt for AI Agents
In `@tests/integration/fixtures.test.ts` around lines 55 - 66, The current catch
block around the FetchStore test only checks error.message.includes('fetch') so
ERR_UNSUPPORTED_ESM_URL_SCHEME and other network/startup errors slip through;
update the catch in the try/catch that wraps the FetchStore/network call (the
catch handling the variable error) to detect a broader set of failures by
checking error instanceof Error and matching message for additional tokens such
as 'ERR_UNSUPPORTED_ESM_URL_SCHEME', 'ECONNREFUSED', 'connect', or other
network/fetch-related substrings (or falling back to a pragmatic heuristic like
treating non-application errors as skip candidates), and apply the same widened
detection to the other similar catch blocks referenced in the comment so the
tests gracefully skip when the test server or environment isn’t available.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@scripts/validate-datasets-js.js`:
- Around line 333-336: The fatal error handler currently logs and exits but
doesn't stop the proxy server if it was started; modify the catch block for
main() to check for the proxy server instance (e.g., proxyServer) and perform a
proper shutdown (call proxyServer.close() and await/handle its completion or
errors) before process.exit(1), or move proxyServer to module scope so the
handler can access and close it; ensure any errors from shutdown are logged and
do not leak the server.
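As a rough illustration of that shutdown path, assuming `startProxyServer()` returns `{ server, baseUrl }` as in the proxy script discussed in this PR; the helper names below are hypothetical, not the script's actual structure:

```ts
// Hypothetical sketch: keep the proxy handle at module scope so the fatal
// handler can close it before exiting.
import type { Server } from 'node:http';

let proxyServer: Server | null = null;

async function main(): Promise<void> {
  // ... dataset validation would go here; proxyServer would be assigned inside.
}

async function shutdownProxy(): Promise<void> {
  if (!proxyServer) return;
  await new Promise<void>((resolve) => {
    // Log shutdown errors but don't let them mask the original failure.
    proxyServer?.close((err) => {
      if (err) console.error('Error closing proxy server:', err);
      resolve();
    });
  });
  proxyServer = null;
}

main().catch(async (error) => {
  console.error('Fatal error:', error);
  await shutdownProxy();
  process.exit(1);
});
```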
🧹 Nitpick comments (5)

README.md (1)

196-199: Add a language identifier to the fenced code block. The code block at line 197 is missing a language specifier. Since it shows a URL format/pattern, use `text` or `plaintext` to satisfy markdown linting.

📝 Suggested fix

````diff
 **Usage:** Proxy a remote URL using query parameter:

-```
+```text
 http://localhost:8081/?url=https://example.com/data.zarr/.zattrs
````

scripts/cors-proxy.js (3)

12-16: Variable shadowing: `resolve` from 'node:path' is shadowed in Promise callbacks. The `resolve` import from `node:path` is shadowed by the Promise `resolve` parameter in `createProxyServer` (line 175). While this works because the import isn't used in that scope, it can cause confusion. Consider renaming the import.

♻️ Suggested fix

```diff
 import { createServer } from 'node:http';
 import { URL, fileURLToPath } from 'node:url';
-import { resolve } from 'node:path';
+import { resolve as resolvePath } from 'node:path';
```

Then update line 211:

```diff
-const isMainModule = process.argv[1] && resolve(process.argv[1]) === __filename;
+const isMainModule = process.argv[1] && resolvePath(process.argv[1]) === __filename;
```

55-66: Consider including the PATCH method in body forwarding. The body-forwarding logic handles POST and PUT, but PATCH requests also commonly include a body. This could cause issues if any proxied APIs use PATCH with payloads.

♻️ Suggested fix

```diff
   // Forward request body for POST/PUT
   let body = null;
-  if (req.method === 'POST' || req.method === 'PUT') {
+  if (req.method === 'POST' || req.method === 'PUT' || req.method === 'PATCH') {
     const chunks = [];
     for await (const chunk of req) {
       chunks.push(chunk);
```

191-206: Unbounded recursion risk in the port retry logic. The `startProxyServer` function recursively retries on the next port when `EADDRINUSE` occurs. In edge cases where many consecutive ports are occupied, this could lead to deep recursion or unexpected behavior. Consider adding a maximum retry limit.

🛡️ Suggested fix with retry limit

```diff
-export async function startProxyServer(port = 8081) {
+export async function startProxyServer(port = 8081, maxRetries = 10) {
+  if (maxRetries <= 0) {
+    throw new Error(`Could not find an available port after multiple retries (starting from ${port - 10})`);
+  }
   try {
     const server = await createProxyServer(port);
     // Get the actual port (in case it was changed due to port conflict)
     const actualPort = server.address()?.port || port;
     const baseUrl = `http://localhost:${actualPort}`;
     return { server, baseUrl };
   } catch (error) {
     // If port is in use, try next port
     if (error.code === 'EADDRINUSE') {
       console.error(`Port ${port} is in use, trying ${port + 1}...`);
-      return startProxyServer(port + 1);
+      return startProxyServer(port + 1, maxRetries - 1);
     }
     throw error;
   }
 }
```

scripts/validate-datasets-js.js (1)

298-307: CSV escaping is incomplete for some fields. While `elements`, `coordinateSystems`, and `errorMessage` are now properly escaped, the `datasetName`, `datasetUrl`, and `implementation` fields are not. If any dataset name or URL contains double quotes (unlikely but possible), the CSV will be malformed.

♻️ Suggested fix for complete CSV escaping

```diff
   } else if (args.outputFormat === 'csv') {
     // Simple CSV generation
+    const csvEscape = (str) => str ? String(str).replace(/"/g, '""') : '';
     const lines = ['Dataset Name,Dataset URL,Implementation,Success,Error Type,Error Message,Elements,Coordinate Systems'];
     for (const r of results) {
       const elements = r.elements ? JSON.stringify(r.elements).replace(/"/g, '""') : '';
       const cs = r.coordinateSystems ? JSON.stringify(r.coordinateSystems).replace(/"/g, '""') : '';
-      const errorMsg = (r.errorMessage || '').replace(/"/g, '""');
-      lines.push(`"${r.datasetName}","${r.datasetUrl}","${r.implementation}",${r.success},"${r.errorType || ''}","${errorMsg}","${elements}","${cs}"`);
+      lines.push(`"${csvEscape(r.datasetName)}","${csvEscape(r.datasetUrl)}","${csvEscape(r.implementation)}",${r.success},"${csvEscape(r.errorType)}","${csvEscape(r.errorMessage)}","${elements}","${cs}"`);
     }
     output = lines.join('\n');
```
…leniently enough to accept number.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@packages/zarrextra/src/index.ts`:
- Around line 63-70: The loop that walks pathParts uses
getZattrs(normalizedPath, store) for every node, causing parent nodes to receive
leaf attributes; change the attrs lookup to use the current node's prefix path
(build the prefix from pathParts[0..i] or maintain a runningPrefix) and call
getZattrs(prefixAsAbsolutePath, store) so each created node (in the loop that
references pathParts, currentNode, isLeaf, isArray) gets attributes for its own
path rather than the final normalizedPath.
- Around line 173-178: The current regex in the loop over v2Metadata.metadata
(matching flatPath.match(/^(.+?)\/(\.zattrs|\.zarray|\.zgroup)$/)) requires a
path prefix and thus skips root keys like ".zattrs" or ".zgroup"; update the
matching logic so the path prefix is optional, capture the metadata type into
metadataType, and when the captured path is missing set path to the root (e.g.,
empty string) before using pathGroups; adjust the conditional that creates/uses
pathGroups to handle the root path as well so root attributes are preserved.
In `@packages/zarrextra/src/zarrSchema.ts`:
- Around line 170-178: In validateAndConvertV2Zarray, the code uses the 'in'
operator on zarray which will throw for null/non-object inputs; first guard
against non-objects by validating that zarray is a non-null object (e.g., typeof
zarray === 'object' && zarray !== null) before casting and computing hasV3Fields
/ hasV2Fields, and if the check fails throw or return a clear validation error;
update the early code around the hasV3Fields/hasV2Fields checks to use that
guard and reference validateAndConvertV2Zarray, obj, hasV3Fields, and
hasV2Fields when making the change.
- Around line 121-132: v3ZarrayBaseSchema currently omits the array-level
attributes field so validateV3Zarray drops any incoming metadata; add an
attributes entry to v3ZarrayBaseSchema (e.g., attributes:
z.record(z.unknown()).optional().nullable()) and then update validateV3Zarray
(the validator/normalizer logic that builds the output object) to preserve the
incoming attributes instead of defaulting to {} so array-level attributes from
the input metadata are retained.
In `@python/scripts/validate_datasets.py`:
- Line 104: The call currently double-quotes the json.dumps result causing
invalid syntax; change the line using sd.read_zarr so it does not wrap
json.dumps(...) in extra string quotes — e.g. replace
sd.read_zarr("{json.dumps(dataset['url'])}") with
sd.read_zarr({json.dumps(dataset['url'])}) (or simply
sd.read_zarr(dataset['url']) if you want to pass the raw value); update the
sd.read_zarr call that references dataset['url'] and json.dumps accordingly.
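Sketching the first `packages/zarrextra/src/index.ts` item above (each tree node should look up its own attributes rather than reusing the leaf's): one way to derive a per-node prefix from a normalized path. This is a minimal illustration only; `getZattrs`, `pathParts` and the store shape are the names used in the review, and the helper below is hypothetical:

```ts
// Hypothetical helper: expand a leaf path into one prefix per ancestor node,
// so attrs can be fetched with getZattrs(prefix, store) for each node in turn.
export function nodePrefixes(normalizedPath: string): string[] {
  const pathParts = normalizedPath.split('/').filter(Boolean);
  const prefixes: string[] = [];
  let runningPrefix = '';
  for (const part of pathParts) {
    runningPrefix = runningPrefix ? `${runningPrefix}/${part}` : part;
    prefixes.push(`/${runningPrefix}`);
  }
  return prefixes;
}

// nodePrefixes('/images/blobs_image/0')
// -> ['/images', '/images/blobs_image', '/images/blobs_image/0']
```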
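And for the `zarrSchema.ts` item about guarding `validateAndConvertV2Zarray` against non-object input before using the `in` operator, a small sketch under the assumption that v2/v3 detection keys are `dtype`/`chunks` vs `data_type`/`chunk_grid` (the actual field lists in the repo may differ):

```ts
// Sketch: refuse null / non-object metadata before probing keys with `in`,
// which throws a TypeError on primitives and null.
function asMetadataRecord(zarray: unknown): Record<string, unknown> {
  if (typeof zarray !== 'object' || zarray === null || Array.isArray(zarray)) {
    throw new Error(
      `Invalid zarray metadata: expected an object, got ${zarray === null ? 'null' : typeof zarray}`
    );
  }
  return zarray as Record<string, unknown>;
}

function detectZarrVersion(zarray: unknown): 2 | 3 {
  const obj = asMetadataRecord(zarray);
  const hasV3Fields = 'data_type' in obj || 'chunk_grid' in obj;
  const hasV2Fields = 'dtype' in obj || 'chunks' in obj;
  if (hasV3Fields && !hasV2Fields) return 3;
  return 2;
}
```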
🧹 Nitpick comments (1)
python/scripts/validate_datasets.py (1)
172-172: Remove unused variable `e`. The exception variable is captured but never used.
♻️ Proposed fix
- except (json.JSONDecodeError, IndexError) as e: + except (json.JSONDecodeError, IndexError):
```ts
const v3ZarrayBaseSchema = z.object({
  shape: z.array(nonNegativeInteger), // Empty array allowed for scalars
  data_type: z.string().min(1),
  chunk_grid: v3ChunkGridSchema,
  chunk_key_encoding: v3ChunkKeyEncodingSchema.optional(),
  fill_value: z.union([z.number(), z.string(), z.boolean()]).optional(),
  codecs: z.array(v3CodecSchema).optional().nullable(),
  dimension_names: z.array(z.string()).optional().nullable(),
  storage_transformers: z.array(z.unknown()).optional().nullable(),
  zarr_format: z.number().optional(),
  node_type: z.literal('array').optional()
});
```
🧩 Analysis chain
🌐 Web query:
Zarr v3 array metadata attributes consolidated metadata specification
💡 Result:
Zarr v3 array metadata: attributes
In Zarr v3, each array has a zarr.json metadata document. Besides required fields (e.g., shape, data_type, codecs, etc.), arrays may include an optional attributes member:
attributes: a JSON object for arbitrary user/application metadata (keys are strings; values are any JSON literals). The core spec shows `attributes` inside an array's zarr.json example. [1]
(Similarly, groups also have optional attributes in their zarr.json.) [1]
Consolidated metadata in Zarr v3: spec status
There is no finalized, core Zarr v3 consolidated-metadata specification in the Zarr v3 core spec. Instead:
- Zarr-Python supports consolidated metadata for v3 but flags it as experimental, noting that a formal v3 extension has been proposed (referenced as zarr-specs#309 in the docs). [2]
- The zarr-specs repository tracks this as a protocol-extension discussion (“consolidated metadata storage for Zarr v3”), explicitly stating that consolidated metadata storage is not part of the v3 spec and likely needs an extension definition. [3]
Practical takeaway: in v3 today, attributes are standard and live in each node’s zarr.json, but “consolidated metadata” is currently an implementation feature/proposed extension, not a settled part of the v3 core spec. [1][2][3]
Sources: [1] Zarr v3 core specification (metadata / array & group attributes) [2] Zarr-Python consolidated metadata docs (v3 is experimental; points to zarr-specs proposal) [3] zarr-specs issue on consolidated metadata for v3 (extension needed)
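To make the distinction above concrete, here is roughly what a v3 array's zarr.json looks like with user metadata under attributes; the values are invented for illustration and expressed as a TypeScript literal:

```ts
// Minimal shape of a Zarr v3 array zarr.json; `attributes` replaces v2's .zattrs.
const exampleV3ArrayMetadata = {
  zarr_format: 3,
  node_type: 'array',
  shape: [256, 256],
  data_type: 'uint16',
  chunk_grid: { name: 'regular', configuration: { chunk_shape: [64, 64] } },
  chunk_key_encoding: { name: 'default', configuration: { separator: '/' } },
  fill_value: 0,
  codecs: [{ name: 'bytes', configuration: { endian: 'little' } }],
  attributes: { description: 'example image channel', channel: 0 },
} as const;
```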
🏁 Scripts executed against Taylor-CCB-Group/SpatialData.js:

```shell
# Find and examine the file
find . -name "zarrSchema.ts" -type f
# Read the zarrSchema.ts file to examine the actual code
wc -l ./packages/zarrextra/src/zarrSchema.ts
# Read the entire file to see the actual code
cat -n ./packages/zarrextra/src/zarrSchema.ts
```
Add attributes field to v3 array schema and preserve from input metadata.
Lines 121-132 and 259-270: v3ZarrayBaseSchema omits attributes, causing validateV3Zarray to always return attributes: {} and discard any array-level attributes from the input metadata.
Suggested fix
const v3ZarrayBaseSchema = z.object({
shape: z.array(nonNegativeInteger), // Empty array allowed for scalars
data_type: z.string().min(1),
+ attributes: z.record(z.string(), z.unknown()).optional(),
chunk_grid: v3ChunkGridSchema,
chunk_key_encoding: v3ChunkKeyEncodingSchema.optional(),
fill_value: z.union([z.number(), z.string(), z.boolean()]).optional(),
codecs: z.array(v3CodecSchema).optional().nullable(),
dimension_names: z.array(z.string()).optional().nullable(),
storage_transformers: z.array(z.unknown()).optional().nullable(),
zarr_format: z.number().optional(),
node_type: z.literal('array').optional()
});

return {
shape: v3.shape,
data_type: v3.data_type,
chunk_grid: v3.chunk_grid,
chunk_key_encoding: v3.chunk_key_encoding ?? {
name: 'default',
configuration: { separator: '/' }
},
fill_value: v3.fill_value ?? 0,
codecs: v3.codecs ?? [],
- attributes: {},
+ attributes: v3.attributes ?? {},
dimension_names: v3.dimension_names ?? [],
zarr_format: 3,
node_type: 'array',
storage_transformers: v3.storage_transformers ?? []
};
🤖 Prompt for AI Agents
In `@packages/zarrextra/src/zarrSchema.ts` around lines 121 - 132,
v3ZarrayBaseSchema currently omits the array-level attributes field so
validateV3Zarray drops any incoming metadata; add an attributes entry to
v3ZarrayBaseSchema (e.g., attributes:
z.record(z.unknown()).optional().nullable()) and then update validateV3Zarray
(the validator/normalizer logic that builds the output object) to preserve the
incoming attributes instead of defaulting to {} so array-level attributes from
the input metadata are retained.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@packages/zarrextra/src/index.ts`:
- Around line 63-93: The code calls getZattrs with the full normalizedPath for
every path segment inside the for-loop (for ... of pathParts.entries()), causing
intermediate group nodes to inherit leaf attributes; change this to compute the
per-segment prefix path (e.g. join the first i+1 parts or build incrementally)
and pass that segment-specific path to getZattrs instead of normalizedPath (keep
using normalizedPath or the full path when isLeaf true); update the variable
used for attrs (currently attrs) and any captured path (arrayPath) accordingly
so each node uses its own segment path while leaving leaf behavior unchanged.
- Around line 169-193: The current grouping loop over v2Metadata.metadata drops
root-level entries because the regex in the path matching (used in the block
that populates pathGroups) requires a non-empty prefix before
"/.zattrs|.zarray|.zgroup"; update the match to allow an optional path prefix
(so it matches both ".zattrs" and "some/path/.zattrs"), and when creating the
group key use a stable representation for root (e.g., empty string or '/' as the
path key) so pathGroups contains root entries; also update any downstream logic
(parseStoreContents) to recognize and handle that root key representation when
converting groups to final outputs. Ensure you modify the matching/assignment
logic around v2Metadata.metadata, pathGroups, and the
group.zattrs/.zarray/.zgroup assignments.
In `@packages/zarrextra/src/zarrSchema.ts`:
- Around line 74-163: The v2 and v3 zarray schemas currently reject valid Zarr
"overhang" chunk sizes and disallow null fill values for v2; update
v2ZarraySchema by removing the refine that enforces each chunk <= corresponding
shape (the refine referencing data.chunks.every(...)) and add z.null() to the v2
fill_value union in v2ZarrayBaseSchema (the fill_value definition), and update
v3ZarraySchema by removing the analogous refine that enforces
chunk_grid.configuration.chunk_shape.every(...); keep the other refines (length
equality and dimension_names) intact so only the bounds checks are dropped and
v2 fill_value accepts null.
In `@python/scripts/validate_datasets.py`:
- Line 104: The call to sd.read_zarr wraps json.dumps(dataset['url']) in quotes
producing a double-quoted string and invalid Python; change the argument to pass
the raw URL string instead (use dataset['url'] directly) so the call in the
sdata assignment (sd.read_zarr(...)) receives a proper Python str rather than an
already-quoted JSON string.
- Line 172: The except clause currently captures an unused variable `e` in
"except (json.JSONDecodeError, IndexError) as e:"; either remove the binding so
it reads "except (json.JSONDecodeError, IndexError):" or keep `as e` and use it
in the error handling (e.g., include `e` in the log/raise message) inside the
same try/except block in python/scripts/validate_datasets.py to eliminate the
unused-variable warning and preserve useful debug info.
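Illustrating the zarrextra/src/index.ts item above about root-level `.zattrs`/`.zgroup`/`.zarray` keys being dropped: a hedged sketch of key matching with an optional path prefix, mapping the root to an empty-string key (the regex and return shape are assumptions, not the repo's current code):

```ts
// Sketch: group v2 consolidated-metadata keys by node path, including the root.
function splitMetadataKey(flatPath: string): { path: string; metadataType: string } | null {
  const match = flatPath.match(/^(?:(.+)\/)?(\.zattrs|\.zarray|\.zgroup)$/);
  if (!match) return null;
  // Root-level keys like ".zattrs" have no prefix; use "" as the root path key.
  return { path: match[1] ?? '', metadataType: match[2] };
}

// splitMetadataKey('.zattrs')              -> { path: '', metadataType: '.zattrs' }
// splitMetadataKey('images/blobs/.zarray') -> { path: 'images/blobs', metadataType: '.zarray' }
```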
🧹 Nitpick comments (1)
python/scripts/validate_datasets.py (1)
219-230: Consider centralizing version constants. The versions "0.5.0", "0.6.1", "0.7.0" are hardcoded here, in argparse choices (line 344), and in the default versions list (line 401). If versions are added or removed, multiple locations need updating.
♻️ Suggested approach
Define a single constant near the top of the file:
SUPPORTED_VERSIONS = ["0.5.0", "0.6.1", "0.7.0"]
Then reference it in argparse, the default versions list, and dynamically generate table headers/columns.
```python
                elements=output.get("elements"),
                coordinate_systems=output.get("coordinate_systems"),
            )
        except (json.JSONDecodeError, IndexError) as e:
```
Remove unused exception variable.
The variable e is captured but never used. Either remove the binding or use it in the error message for debugging.
🔧 Proposed fix
- except (json.JSONDecodeError, IndexError) as e:
+ except (json.JSONDecodeError, IndexError):
🧰 Tools
🪛 Ruff (0.14.14)
[error] 172-172: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
🤖 Prompt for AI Agents
In `@python/scripts/validate_datasets.py` at line 172, The except clause currently
captures an unused variable `e` in "except (json.JSONDecodeError, IndexError) as
e:"; either remove the binding so it reads "except (json.JSONDecodeError,
IndexError):" or keep `as e` and use it in the error handling (e.g., include `e`
in the log/raise message) inside the same try/except block in
python/scripts/validate_datasets.py to eliminate the unused-variable warning and
preserve useful debug info.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/core/tests/schemas.spec.ts`:
- Around line 230-232: The comment on spatialdata_attrs in the shapesAttrsSchema
test is misleading; update the comment next to spatialdata_attrs (in the
shapesAttrsSchema declaration/expected object) to state that this is common
spatial metadata shared across element types (not "Library version metadata (for
raster elements)"), so replace or reword the comment to reflect that
spatialdata_attrs is common metadata across element types and may include
version info like '0.6.1'.
🧹 Nitpick comments (6)

packages/core/tests/schemas.spec.ts (4)

13-13: Clarify or address the TODO comment. The comment "TODO: invert this type..." is ambiguous. Consider either:
- Adding more context about what needs to be inverted and why
- Creating a tracking issue if this is deferred work
- Removing it if no longer applicable

117-119: Consider consolidating redundant parse calls. The pattern of calling `expect().not.toThrow()` followed by another `parse()` is repeated throughout. You can simplify by parsing once and asserting on the result.

♻️ Suggested consolidation

```diff
-    expect(() => coordinateTransformationSchema.parse(transform)).not.toThrow();
-    const result = coordinateTransformationSchema.parse(transform);
-    expect(result[0].type).toBe('scale');
+    const result = coordinateTransformationSchema.parse(transform);
+    expect(result[0].type).toBe('scale');
```

If the parse succeeds, assertions on the result implicitly confirm it didn't throw. Reserve `expect().not.toThrow()` for tests where you only need to verify parsing succeeds without inspecting the result.

248-266: Consider adding a test for points attrs without transformations. For consistency with the `shapesAttrsSchema` tests (which include both with and without transformations), consider adding a similar negative case for `pointsAttrsSchema`. Also, the comment on line 260 has the same issue as noted above: "(for raster elements)" is misleading for points attrs.

♻️ Suggested additional test

```ts
it('should accept points attrs without transformations', () => {
  const attrs = {
    'encoding-type': 'ngff:points',
    axes: ['x', 'y'],
  };
  expect(() => pointsAttrsSchema.parse(attrs)).not.toThrow();
});
```

293-322: Consolidate duplicate test cases. The tests at lines 293-313 and 325-343 validate nearly identical scenarios (valid spatial data metadata with affine transformation). Similarly, lines 315-322 and 345-351 both test rejection of a numeric version. Consider consolidating these to reduce redundancy while maintaining coverage.

♻️ Suggested consolidation

Keep the more comprehensive tests and add comments noting they cover cases originally in index.test.ts:

```diff
-  // Tests from packages/core/src/index.test.ts
-  it('should validate spatial data metadata (from index.test.ts)', () => {
-    // coordinateSystems values are arrays of transformations
-    const validMetadata = {
-      version: '0.1.0',
-      coordinateSystems: {
-        global: [
-          {
-            type: 'affine',
-            affine: [
-              [1, 0, 0],
-              [0, 1, 0],
-            ],
-          },
-        ],
-      },
-    };
-
-    expect(() => spatialDataSchema.parse(validMetadata)).not.toThrow();
-  });
-
-  it('should reject invalid spatial data metadata (from index.test.ts)', () => {
-    const invalidMetadata = {
-      version: 123, // should be string
-    };
-
-    expect(() => spatialDataSchema.parse(invalidMetadata)).toThrow();
-  });
```

The existing tests at lines 293-322 already cover these cases adequately.

Also applies to: 325-351

packages/vis/tests/index.spec.tsx (1)

1-42: Test file structure looks good for API surface validation. The tests appropriately verify that the public exports from `@spatialdata/vis` are defined and have the expected types. A few minor observations:
- The file extension `.tsx` could be `.ts` since there's no JSX content.
- The test at lines 30-41 partially duplicates checks from lines 11-17 (both verify the same component names).

Regarding the pipeline failure about `@spatialdata/react`: this appears to be a build/dependency resolution issue in the react package itself (possibly missing or incorrect `main`/`module`/`exports` in its package.json), not an issue with this test file.

packages/zarrextra/tests/zarrSchema.spec.ts (1)

130-226: Consider adding a test for v3 dimension_names length mismatch. The v2 schema tests include a case for a `dimension_names` length mismatch (lines 118-127), but the v3 schema tests don't have an equivalent. Based on the relevant code snippet showing that `v3ZarraySchema` has the same refinement rule for `dimension_names`, this edge case should also be tested for v3.

📝 Suggested test to add

```ts
it('should reject v3 zarray with dimension_names length mismatch', () => {
  const invalidZarray = {
    shape: [100, 200],
    data_type: 'float64',
    chunk_grid: {
      name: 'regular',
      configuration: { chunk_shape: [10, 20] },
    },
    dimension_names: ['y'], // Wrong length
  };
  expect(() => v3ZarraySchema.parse(invalidZarray)).toThrow();
});
```
```ts
        spatialdata_attrs: {
          version: '0.6.1', // Library version metadata (for raster elements)
        },
```
Misleading comment about version metadata scope.
The comment states "Library version metadata (for raster elements)" but this test is for shapesAttrsSchema, not raster. spatialdata_attrs is common metadata across element types.
📝 Suggested fix
spatialdata_attrs: {
- version: '0.6.1', // Library version metadata (for raster elements)
+ version: '0.6.1', // Library version metadata
},
🤖 Prompt for AI Agents
In `@packages/core/tests/schemas.spec.ts` around lines 230 - 232, The comment on
spatialdata_attrs in the shapesAttrsSchema test is misleading; update the
comment next to spatialdata_attrs (in the shapesAttrsSchema declaration/expected
object) to state that this is common spatial metadata shared across element
types (not "Library version metadata (for raster elements)"), so replace or
reword the comment to reflect that spatialdata_attrs is common metadata across
element types and may include version info like '0.6.1'.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@vitest.config.ts`:
- Around line 1-2: The config currently relies on the CommonJS __dirname which
is undefined in ESM; replace __dirname usage by deriving the directory from
import.meta.url using fileURLToPath (from 'url') and then use resolve (or
path.resolve) with that directory; update any places referencing __dirname
(e.g., where resolve(__dirname, ... ) is used) to use the computed dirname
variable so Vitest config (defineConfig) runs under ESM without ReferenceError.
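A minimal sketch of that change, assuming the config only uses __dirname for path resolution (the alias shown is illustrative, not the repo's actual config):

```ts
// vitest.config.ts - derive a dirname that works under ESM, where __dirname is undefined.
import { defineConfig } from 'vitest/config';
import { fileURLToPath } from 'node:url';
import { dirname, resolve } from 'node:path';

const configDir = dirname(fileURLToPath(import.meta.url));

export default defineConfig({
  resolve: {
    alias: {
      // Hypothetical alias; replace with whatever resolve(__dirname, ...) was doing.
      '@spatialdata/core': resolve(configDir, 'packages/core/src'),
    },
  },
  test: {
    environment: 'node',
  },
});
```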
🧹 Nitpick comments (2)
python/scripts/validate_datasets.py (2)
219-220: Consider extracting the version list as a constant. The version list `["0.5.0", "0.6.1", "0.7.0"]` appears hardcoded in multiple places (table header and detailed results iteration). This creates maintenance overhead when adding new versions.

♻️ Optional refactor to centralize versions

```diff
+SUPPORTED_VERSIONS = ["0.5.0", "0.6.1", "0.7.0"]
+
 # In generate_markdown_table:
-    lines.append("| Dataset | v0.5.0 | v0.6.1 | v0.7.0 | URL |")
-    lines.append("|---------|--------|--------|--------|-----|")
+    version_headers = " | ".join(f"v{v}" for v in SUPPORTED_VERSIONS)
+    lines.append(f"| Dataset | {version_headers} | URL |")
+    lines.append("|---------|" + "--------|" * len(SUPPORTED_VERSIONS) + "-----|")
```

Also applies to: 251-251

341-347: Version list is duplicated across the codebase. The version list appears in argparse choices (line 344), the `main()` default (line 401), and `generate_markdown_table()`. Centralizing this as a module-level constant would improve maintainability.

Also applies to: 400-401
…se zarr chunk separator, add v3 dtype schema.
…hema, improve validation error handling for zarray metadata. Removed redundant chunk dimension checks.
`zarrextra` has been updated to process what I think we understand zarr v3 consolidated metadata to be. Along with this, the schema used for `RasterAttrs` has been changed so it should support outputs from spatialdata 0.6 - at least as far as `{ ome: { multiscales: [] } }` vs `{ multiscales: [] }` goes; I may well be missing any number of other things.
Should fix #4
Some fairly extensive testing has been added.
- `uv` environments for SpatialData 0.5.0, 0.6.1 & 0.7.0. Not necessarily intending to keep having a whole matrix of every version in this way.
- `spatialdata.datasets.blobs` generated from those versions of the Python library.
blobs... pending getting all of those tests to actually work properly in CI environment.Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores